Bootstrapping Ratfor

Ratfor was bootstrapped by modifying the "Ratfor in Fortran" implementation supplied on the Software Tools tape. The subgroupgfio library was used to handle ASCII (actually ICL ECMA) text I/O.

To work with my new I/O library, some minor changes to the Ratfor in Fortran bootstrap are needed:

  1. We replace the Fortran-esque file numbers 5 (input) and 6 (output) with 0 (stdin), 1 (stdout) and, where appropriate, 2 (stderr).

  2. We don't need the INMAP and OUTMAP functions, or the EXTxxx and INTxxx arrays that convert between the internal and external character sets; this is done by our I/O routines.

  3. The bootstrap OPEN routine becomes:
    C OPEN 
    C                                                                          16120
           INTEGER FUNCTION OPEN(NAME, MODE)                                   16130
           INTEGER NAME(30)                                                    16140
           INTEGER MODE
    
    C - ICL1900: use GFOPEN.
    
    C - Problem.  GFOPEN expects filename in 6 bit character set, but we've
    C   been passed a name in ASCII.  Use a string I/O file to do the
    C   conversion.
    
           INTEGER I, K
           INTEGER F
           INTEGER BUF (8)
    
           INTEGER GFOPENSTR, GFWRITEECMA, GFOPEN, GFCLOSE, GFWRITTEN
    
           LOGICAL EOL(64)
           COMMON /EOL/ EOL
    
           DO 100 I = 1, 30
              IF (NAME (I) .EQ. 10002) GOTO 200
    100    CONTINUE
           I = 31
    
    200    F = GFOPENSTR (-1, BUF(1), 0, 32)
           K = GFWRITEECMA (F, NAME(1), I - 1) 
           K = GFWRITTEN (F)
           I = GFCLOSE (F)
    
           I = GFOPEN (BUF(1), K, MODE, 0)
           IF (I .LT. 0) GOTO 800
    
           EOL (I+1) = .TRUE.
           GOTO 900
    
    800    I = 10001
    
    900    OPEN = I
           RETURN                                                              16190
           END                                                                 16200
    
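    We need an extra common array, EOL, to flag that we have reached end of line on an input file. OPEN sets the flag so that the first GETCH call on the file reads a line before it fetches a character.

    "10001" is the strange value used to report an error (here, that GFOPEN failed), and "10002" marks the end of the file name string.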

  4. And CLOSE becomes:
    C                                                                           4030
    C CLOSE
    C                                                                           4050
           SUBROUTINE CLOSE(FD)                                                 4060
           INTEGER FD                                                           4070
           CALL GFCLOSE (FD)
           RETURN                                                               4090
           END                                                                  4100
    
  5. The bootstrap GETCH routine becomes:
    C GETCH - GET CHARACTERS FROM FILE                                          8060
    C                                                                           8070
           INTEGER FUNCTION GETCH(C, F)                                         8080
    
           INTEGER GFREADLN, GFREADECMA
    
           INTEGER C, F
    
           LOGICAL EOL(64)
           COMMON /EOL/ EOL
    
           IF (.NOT. EOL (F+1)) GOTO 100
           EOL (F+1) = .FALSE.
    
           C = GFREADLN(F)
    
           IF (C .GE. 0) GOTO 100
    
           C = 10003
           GOTO 900
    
    100    C = GFREADECMA (F)
           IF (C .GE. 0) GOTO 900
           
           C = 10
           EOL (F+1) = .TRUE.
    
    900    GETCH = C
           RETURN
           END
    
    "10003" is the strange value chosen for EOF.

    "10" is the character code for linefeed.

  6. The PUTCH routine translates fairly simply into calls to GFWRITEECMA and GFWRITELN:
    C PUTCH (INTERIM VERSION)  PUT CHARACTERS                                  21140
    C                                                                          21150
           SUBROUTINE PUTCH(C, F)                                              21160
           INTEGER C,F
    
    C - ICL1900: use GFWRITEECMA and GFWRITELN.
    
           INTEGER GFWRITEECMA, GFWRITELN
    
           IF (C .EQ. 10) GOTO 100
    
           IF (GFWRITEECMA (F, C, 1) .LT. 0) STOP 'IO'
           RETURN
    
    100    IF (GFWRITELN (F) .LT. 0) STOP 'IO'
           RETURN
    
           END                                                                 21370
    
    Character "10" is linefeed.

  7. For the REMARK routine we use the standard ICL Fortran ICOMP function to find the "." at the end of the message, then GFWRITE6 to write the message:
           SUBROUTINE REMARK(BUF)                                              22410
           INTEGER BUF(100), I                                                 22420
           DO 100 I = 1, 400
              K = -1
              IF (ICOMP (K, BUF(1), I, 1H., 1).EQ.0) GOTO 200
    100    CONTINUE
           I = 400
    200    CALL GFWRITE6 (2, BUF(1), 0, I)
           CALL GFWRITELN (2)
    
           RETURN                                                              22450
           END                                                                 22460
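
The way GETCH (item 5 above) uses the EOL flag can be sketched in Python. This is only an illustration of the control flow: the LineReader class is a hypothetical stand-in for GFREADLN and GFREADECMA, and the assumption that both return a negative value when nothing more is available is taken from the tests in the Fortran code, not from the library documentation.

```python
EOF = 10003      # value GETCH returns at end of file
NEWLINE = 10     # character code for linefeed


class LineReader:
    """Hypothetical stand-in for a file read a line at a time."""

    def __init__(self, text):
        self._lines = text.splitlines()
        self._next = 0
        self._current = ""
        self._pos = 0

    def read_line(self):
        """Load the next line; return -1 at end of file."""
        if self._next >= len(self._lines):
            return -1
        self._current = self._lines[self._next]
        self._next += 1
        self._pos = 0
        return len(self._current)

    def read_char(self):
        """Return the next character code, or -1 when the line is used up."""
        if self._pos >= len(self._current):
            return -1
        code = ord(self._current[self._pos])
        self._pos += 1
        return code


class Getch:
    """Mirrors the flow of the Fortran GETCH for a single file."""

    def __init__(self, reader):
        self.reader = reader
        self.eol = True  # set by OPEN, so the first call reads a line

    def __call__(self):
        if self.eol:
            self.eol = False
            if self.reader.read_line() < 0:
                return EOF
        code = self.reader.read_char()
        if code >= 0:
            return code
        # Line exhausted: hand back a linefeed and read a new line next time.
        self.eol = True
        return NEWLINE
```

Calling a Getch object repeatedly yields the characters of each line, a linefeed at the end of each line, and finally the EOF value.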
    

Ratfor In Ratfor

Once the Ratfor in Fortran version was working, similar changes were made to the Ratfor in Ratfor compiler, which was then able to compile itself.

One surprising problem was that the Ratfor in Ratfor provided on the Software Tools tape was written to use "^=" for the not-equals operator, but my modified Ratfor in Fortran expected "!=". This is because the original Ratfor implementation mapped the "^" character into the "!" character for all input. Since I had removed the character mapping stage, this was no longer happening. The solution was to modify the Ratfor in Ratfor to use "!=" and to change the "gtok" and "relate" functions to allow "^" and "~" as synonyms for "!".
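
The effect of accepting "^" and "~" as synonyms for "!" can be sketched as follows. This is a Python illustration, not the actual "gtok" or "relate" code; the function name translate_not is made up for the example, and the .ne. and .not. outputs are the usual Fortran equivalents of the Ratfor operators.

```python
NOT_CHARS = "!^~"  # all three characters are treated as "not"


def translate_not(text, pos):
    """Translate a not or not-equals operator starting at text[pos].

    Returns (fortran_operator, next_position), or (None, pos) when the
    character at pos is not one of the accepted "not" characters.
    """
    if pos >= len(text) or text[pos] not in NOT_CHARS:
        return None, pos
    # A following "=" makes it the not-equals operator.
    if pos + 1 < len(text) and text[pos + 1] == "=":
        return ".ne.", pos + 2
    return ".not.", pos + 1
```

With this rule, source written with "^=" and source written with "!=" produce the same Fortran.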

Initial experience and improvements

Although the preprocessor works, it is very slow, taking 15 seconds of CPU time to compile itself, whereas the Fortran compiler takes 2 seconds to compile the resulting Fortran code. The paper "MOUSE4: An improved implementation of the RATFOR preprocessor" by Douglas Comer describes how a faster version was produced.

I decided to modify the Ratfor preprocessor in the ways suggested by the MOUSE4 paper:

  1. Replace the character-at-a-time input and separate putback buffer of Ratfor with line-at-a-time input, with the pushback data stored at the start of the buffer.
  2. Use a state-based lexical analyser (scanner) in place of the "ad-hoc" methods used by Ratfor, and keep track of the type of symbol scanned rather than testing for the symbol type repeatedly.
  3. Use a hash table for macro lookup instead of a simple linear lookup.

With these changes the Ratfor preprocessor is around four times faster: the preprocessing time for the editor, for example, falls from 13 seconds to 3 seconds.
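
The first of these changes, line-at-a-time input with pushback kept in front of the unread text, might look something like the Python sketch below. It is my reading of the idea, not code from MOUSE4 or from the modified preprocessor; the class name, the reserve size of 64 and the method names are all assumptions made for the example.

```python
class LineBuffer:
    """Line-at-a-time input with pushback stored ahead of the read position."""

    def __init__(self, lines, reserve=64):
        self._lines = iter(lines)
        self._reserve = reserve            # free space kept in front of each line
        self._buf = [""] * reserve
        self._pos = reserve                # index of the next character to read

    def _fill(self):
        """Read one line into the buffer, leaving the reserve in front of it."""
        line = next(self._lines, None)
        if line is None:
            return False
        self._buf = [""] * self._reserve + list(line)
        self._pos = self._reserve
        return True

    def getch(self):
        """Return the next character, or None at end of input."""
        while self._pos >= len(self._buf):
            if not self._fill():
                return None
        c = self._buf[self._pos]
        self._pos += 1
        return c

    def putback(self, text):
        """Push text back so that it is the next thing getch returns."""
        if len(text) > self._pos:
            raise OverflowError("too many characters pushed back")
        start = self._pos - len(text)
        self._buf[start:self._pos] = list(text)
        self._pos = start
```

Because pushed-back text lives in the same buffer as the current line, reading a character is a single index and increment whether or not anything has been pushed back.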

To reduce this time further, more detailed performance analysis will be needed. I am looking for ways of adding profiling code, possibly by adding profiling options to the Ratfor compiler, or possibly by piggy-backing on the Fortran tracing mechanisms.

The modified preprocessor has some minor language differences from the original:

  1. The define and include instructions are handled at statement level, so it is not possible, for example, to do a define in the middle of a statement.
  2. The original Ratfor preprocessor "folds" the case of all symbols after searching for macros, mapping everything to lowercase. The problem is that Ratfor re-scans symbols in some contexts, so it can get confused about whether the symbol was uppercase or lowercase.

    For example:

    define(a,b)
    a                 # This outputs "b" as expected
    A                 # But so does this
    A A               # This outputs ba!
    
    The modified preprocessor makes no attempt to emulate this behaviour. It does, however, accept all Ratfor keywords in upper or lower (but not mixed) case.
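
The "upper or lower (but not mixed) case" rule for keywords can be expressed as a short Python check. The keyword list here is a hypothetical subset chosen for illustration; the preprocessor's real keyword table is not reproduced in this text.

```python
# Hypothetical subset of Ratfor keywords, for illustration only.
KEYWORDS = {"if", "else", "while", "for", "repeat", "until", "do", "break", "next"}


def is_keyword(word):
    """Accept a keyword written entirely in lower case or entirely in upper case."""
    if not (word.islower() or word.isupper()):
        return False  # mixed case such as "While" is rejected
    return word.lower() in KEYWORDS
```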

The modified preprocessor also has some extensions to the language of the original:

  1. The string declaration can be used to define ASCII strings:
    	string TEXT "some text"
    	string BOUNDED (23) "bounded length string"
    
    The string terminator used is the value of the macro EOS at the point the string declaration is used, or the EOS value used for the compilation of the Ratfor preprocessor if none is defined by the caller.

    A caller who wants to use the SUBGROUPRAT library included with the preprocessor should include :LIB.IODEFS to get the EOS value.

  2. The define statement can be written as:
    	define WORD text up to EOL
    
    as well as the standard
    	define(WORD,text with nested parens)
    
  3. Numbers can be represented in non-decimal bases in the form base%value, for example 2%10101 or 8%2077 or even 36%1AZW.
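
To show what the base%value notation denotes, here is a small Python sketch that converts such a token to an integer. It illustrates the notation only and is not the preprocessor's own conversion code; the function name is made up, and the limit of bases 2 to 36 is an assumption that comes from Python's int(), not from the preprocessor.

```python
def parse_number(token):
    """Convert a 'base%value' token, or a plain decimal token, to an int."""
    if "%" not in token:
        return int(token)
    base_text, digits = token.split("%", 1)
    base = int(base_text)  # the base itself is written in decimal
    if not 2 <= base <= 36:
        raise ValueError("base out of range: " + base_text)
    # Digits above 9 are the letters A to Z, as in 36%1AZW.
    return int(digits, base)
```

For instance, parse_number("2%10101") gives 21 and parse_number("8%2077") gives 1087.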

Availability

A NEWCOPYIN format magnetic tape image containing the source code and instructions for installing it can be found here.